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Abstract 

We propose a method for obtaining maximum likelihood estimates in a 
model with continuous and binary outcomes. Combinations of left and 
right censored observations are also naturally modeled in this framework. 
The model and estimation procedure has been implemented in the R package 
lava.tobit. 

The method is demonstrated on brain imaging and personality data 
where measurement error on predictor variables is handled in a latent vari¬ 
able framework. A simulation study is conducted comparing the small sam¬ 
ple properties of the MLE with a limited information estimator. 
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1. Introduction 


Correlated binary data appears in a wide range of applications in psy¬ 
chometrics and epidemiology, including questionnaire studies, longitudinal 
studies, and cross-sectional studies with multivariate measurements. De¬ 
pending on the application, different methods have been introduced to 
model such data. Sometimes marginal effects are of primary interest in 
epidemiology, in which case inference based on generalized estimating equa¬ 
tions (GEE) (Liang and Zeger, 1986) is preferable, and consistent estimates 
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and standard errors are obtained withont the need for explicit modeling of 
the variance strnctnre of the ontcomes. In scale validation stndies (based 
on Item Response Theory), and for certain types of matched analyses snch 
as matched case-control stndies, inference based on the conditional maxi- 
mnm likelihood (CML) (Andersen, 1971) offers several advantages dne to 
its compntational simplicity and the minimal reqnired assnmptions on the 
distribntion of the random effects. However, when the assnmptions for the 
CML are not fnlhlled, or when interest lies in conditional (snbject specihc) 
effects or qnantihcation of the actnal variance components, other methods 
are needed. 

A vast amonnt of literatnre and software has been written to deal with 
this sitnation. The main challenge here is that the likelihood fnnction is 
given as an integral with respect to the random effects and a nnmerical ap¬ 
proximation to this intractable integral is generally reqnired. Approximate 
integrated likelihood methods have been proposed via Adaptive Ganssian 
Qnadratnre (AGQ) (Pinheiro and Ghao, 2006), and varions variants of the 
Expectation-Maximization-algorithm snch as Monte Garlo EM (MGEM) 


(Wei and Tanner 

1990, McGnlloch, 

1997 

Song and Lee 

2005 

) and Stochas- 

tic Approximation EM (SAEM) (Meza and Fonlley 

2009 

). Also Bayesian 


methods have been popnlarized with the implementation of general Gibbs- 


samplers (Gilks et ah, 1994, Plnmmer, 2003). 


The dominating framework in applied statistics, however, seems to be 
the AGQ framework, which is implemented in most widely nsed software 
packages snch as stata GLLAMM (Rabe-Hesketh et ah, 2004), lme4 in R 


dBates et aH^OMl ), and SAS PROG NLMIXED ( |SAS Institnte Inc.i|2002-2005| ). 
In contrast most Monte Garlo methods are based on more experimental im¬ 
plementations, where ad hoc decisions on e.g. convergence has to be made. 
An attractive alternative is available via the limited information estimator 
proposed by Mnthen (Mnthen, 1984) which generalizes classical strnctnral 
eqnation models to allow for inclnsion of binary ontcomes modeled via a 
Probit link. 

Onr model is based on the same principles as (Mnthen, 1984). An eqniv- 
alent model formnlation is a threshold model where the responses are gener¬ 
ated by nnderlying normal latent variables. This leads to a generalization of 
tetrachoric correlations, where we allow conditioning on both covariates and 
random effects. Nnmerical approximations is here limited to evalnations of 
orthant probabilities of the mnltivariate normal distribntion and with com¬ 
pntational complexity that is independent of the nnmber of random effects 
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in the model. This is in contrast to the AGQ, where the complexity grows 
exponentially in the number of latent variables (the curse of dimensionality). 


2. The model 

Let Yij be a binary observation. The Probit model can be formulated 
as a threshold model (see Figure [^, where we assume the existence of an 
underlying conditionally normally distributed variable T)* such that 


Y = 

j- ij 


, > 0 
0, Y*^<0. 


( 1 ) 




Figure 1: Univariate threshold model with probability of event for the observed binary 
variable Y given by P(F = 1) = P(y* > r). For the bivariate case P(yi = 1,12 = 1) = 
FiY^* > > T2). 


We will let the underlying latent structure be described by a general 
Linear Latent Variable Model (LLVM), which can be translated into a mea- 
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surement part describing the latent responses, Y*j: 


Yj^j r'j “ 1 “ ^ ^ ^jk^ik “ 1 “ ^ ^ Kj-^Xir -\- ^ ^ ^jkYijkPik 


k=l 


r=l 


k=l 


( 2 ) 


+ • • • , Xiq) + eij, 


given covariates Xir^ Vijk, and a structural part describing the random effects 


Vis “1“ ^ ^ PskVik ^ ^ Vir^ir ^ ^ ^sfchhj, 


kVik 


r=l 


( 3 ) 


+ ■ ■ ■ ! ^iq) + Cik, 


with individuals i = 1,... ,n (between clusters observations), measurements 
j = 1,... ,p (within cluster observations) and covariates Wisk and random 
effects Vik^k = For continuous observations we simply let Yij = 

Y*j. Notice the interaction term djrVijrVir defines random slope components 
and differentiate the LLVM from the a classic Structural Equation Model 
formulation where the conditional variance of the responses given covariates 
is assumed to be constant between individuals. Also the functions and 
allows for non-linear effects of the covariates parameterized by 6. The 
random effects and residual terms e and ( are assumed to follow normal 
distributions, thus making the likelihood of the observed variables available 
in explicit form. We will in details describe how to evaluate the likelihood 
and score function of this model, and we will compare the performance 


of our method to the limited information estimator (Muthen, 1984). The 


method is implemented in the open-source R package lava.tobit (Holst 


2012 ) 


3. Estimation method 

In the following subsections we will derive the likelihood and the score 
function for a Gaussian latent variable model under censoring. The Probit 
model for dichotomous outcomes follows immediately as a special case. 


3.1. Right censored data 

We first derive a multivariate extension of the Tobit regression model for 
right censored normal distributed variables (Tobin, 1958), and we later show 
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that this model immediately extends to the more general latent variable 
framework. 

Let Y G be a multivariate censored response, i.e. there exists an 
underlying variable Y* and random censoring times C = (Ci,..., Cp)', 
such that 





1 ^* < C, 
Y* > C,. 


(4) 


For a realization of Y, we will partition the vector into y = {y'liy'o)' 
where y^ G M™' are fully observed and y^ G are actually censored 
components of Yg*- We will assume that Y* follows a normal distribu¬ 
tion N'{$, 0 ,Q>e)- In a common setup, we will also wish to condition on a 
set of covariates, (X, V, VY), but we will omit these here without loss of 
generality to avoid unnecessarily complex notation. 

Let p = (pi,... ,pfc)', and dehne the set 

k k 

= (^] - oo,pi] and ®p = (g)[pi, cx)[. (5) 

i=l i=l 

Now the likelihood contribution (with a somewhat sloppy notation w.r.t 
joint and conditional probability densities) of a single observation is given 
by; 


fe{.yi,Vo)= / (p 0 {yi,u)du = ipei^yi) / (pe{u\yi)du. ( 6 ) 

J®yo J®yo 

Obviously different individuals will typically have different patterns of cen¬ 
soring, i.e. the partitioning of Y will be different, and hence to calculate 
the log-likelihood of a complete data-set, summing over the log-likelihood 
contributions of the different patterns of censoring is necessary. 

In the following let and denote the density and CDF, respec¬ 
tively, of the multivariate normal distribution with mean p, and variance S 
(with (po,s = and <ho,s = 4>s-)- 
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In the case of no censoring the score is given by 



In the censored case, the derivative of the hrst term in the log-likelihood 
corresponding to Yi 


Si{0;y) = -^log^peivi) ( 8 ) 

is easily obtained from ([^ by extracting the relevant sub-matrices of and 
fig and their derivatives, and it follows that 

S{0-,y) = Si(0;s(i) + ^log'S>-„,,E,(-!/o) (9) 

where y,g and Hg are the mean and variance of the conditional distribution 
of Yg* given Yi = ?/i. 

We partition the mean and variance according to the (Yi, YJ^*); 

«-(f) $>)• 

hence 

( 12 ) 

i:g = flf^ ( 13 ) 


6 





and it follows that 


d vec He d vec 


( 0 ) 


dO' 


dO' 


+ 


{(yi 




, d vec Qn 

1 ‘ ® 


( 01 ) 


dO' 


- {(?/ 


e(i)yQ(i)-i 

1 “ 1 “fl 


o(io)o(i)-n dvecCl^g^ 

] QQ, 


QQ, 


( 1 ) 


( 14 ) 


dvecSff dvecfi 


( 0 ) 

6» 


dO' 


+ (n 

-d 


dO' 

(01)o(l)-l 




avecdg 


( 01 ) 


a6>' 

( 1 ) 


SI'' 
e ^‘‘e 


e e > QQ! 


(oi)o(i)-i 


sir'si 


9 vec 
90^ 


(15) 


To calculate the score, (|^, the derivative of the CDF with respect to 6 
is needed, and the result is given in Lemma below. We will need to define 


the operators Vu s : 


-)■ 


pfcxfc 


and A<^,s; 


-)■ 


as 


[A<M,s(2/)]i = / (p^,-E{x){xi-iJ.i)dx, 

JOy 

[Vy,,s{y)]ij = / - l^j) dx. 


(16) 

(17) 


Evaluating these quantities are obviously closely related to calculating the 
moments of a truncated normal distribution. We assume without loss of 
generality that S is a correlation matrix, and look at a standardized normal 
distribution W ~ A/'(0, S), and let T be the right-truncated version of VF, 


such that T is equal to W on the region ^y. From (Tallis, 1961) the moment 
generating function of T is 

m{t) = E(exp* ^) = a~^ exp{^t''St)^s{y — St), (18) 

with normalizing constant a = $x;(?/), and hence the first and second mo- 
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ments of T are 


E(r) = -a-‘Sr>45:(!/) 
E(TT') = S + 


(19) 

( 20 ) 


with D and H denoting the gradient and hessian operator. To calcnlate 
these qnantities, we can exploit that for the stochastic variables (f/, V) par¬ 
tial derivatives with respect to the hrst components of the corresponding 
CDF, F, is simply the prodnct of the marginal probability density fnnc- 
tion of U times the CDF of the conditional distribntion of V given U, 
duF{u,v) = f{u)F{v\u). As the conditional distribntions of a normal dis¬ 
tribntion is available in analytical form, the gradient and hessian of can 
explicitly be calcnlated as fnnctions of the CDF and pdf only. Details can be 
fonnd in (Tallis, 1961) and (Vaida and Lin, 2009, snpplementary). Fnrther, 
as fast and precise approximations are available for the evalnation of the 
mnltivariate normal CDF (Genz , [l992 i Genz et ah, 2014), the evalnation of 
the integrals (16) is an achievable task. 


Lemma 1. Assume has full rank and let Rg = be the corre¬ 

sponding correlation matrix where Ag is the diagonal-matrix with diag(A 0 ) = 
diag(Se)^/^. Then 


= -AgRgD^u i-^g^iy - Me)) 

+ AgRgH^R (^Ag^{y — y,g)) RgAg. 

Proof. Follows immediately by snbstitntion v = 'ip{u) = Aq^{u — /xg), not¬ 
ing that is scaled and translated bnt not rotated. □ 


Lemma 2. 


QQ ^iy) 


1 / dvec 

2 V 96' 

/'■V-l .o ■V'-l 


vec(S/)$^,,Se( 2 /) + 


Eg )vec{Vf,g,Sgiy)} 


+ 


dvecfie\'^ 

QQ, 1 vec{A4^,,Se(2/)} 
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Proof. By the fundamental theorem of calculus 


1 f d vec S/ 


/e^(2vr)^/2|s,|V2 2 


dO' 


X vec(Sg 

X exp (-|(w - ^le)'^'e^{u - ^ig)) du 


+ 


(27r)^/2 

X exp (-|(w - ^lg)'T,f^{u - ^Ig)) 


I f d vec S 

^ ^2 V dO' 
1-1 


X Vec(Sg (u — ^Ig)^ 'll 


-ll 


a vec 1 

86' J 


(w - /J,g)jdu, 


where we have used that 

A 


1 

1 , , 

1 / 

ISel 2 = 

— SJ 
2 ' 

n 


I f 8 vec S/ 


a 6 >' 


vec 


S»') 


and 


8 


— {{u - - ^le)) = 


8 vec 'Ef 


86' 


vec(Sg ^{u - Aie)®^S 


-ii 


avec^eV 1 

gg, I (U - /t») 


( 21 ) 


( 22 ) 


(23) 


derived by using standard matrix derivative results (Magnus and Neudecker 


1988). 


□ 
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3.2. Combinations of left and right censored data 

The extension to inclnde left censored variables follows almost immedi¬ 
ately from the right censored case. As before we let Y G be a mnltivari- 
ate censored response, with nnderlying normal response Y* 
defined from right censoring time points C = {Ci,..., Cp)' and left censor¬ 
ing time points C = (Ci,..., Cp)', snch that 

( Y*, Cj < Y* < 

= y:>Cj 

[c„ Yf<C,. 

Again we will partition an observation into y = ^g, y'o)', where y^ G 

are actnal right censored and yo G are actnal left censored, and we 
exploit that the conditional distribntion of the censored indices in Y given 
the fnlly observed indices is S^). 

We define the diagonal matrix 



L — li 


(25) 


snch that 


<P,.tg,Sg(Ul, U 2 ) duidu2 = 


dOyoX'Syo 

/ ^L/J.g,LSgL(Ul, U2) dUidU2, 

■^®yoX©—yo 

hence the log-likelihood is given by 

logcpffiyi) + log^LMg,LSgL ( 

\-yo 


(26) 


(27) 


which is identified as the likelihood of right-censored data, and the expres¬ 
sion for the score fnnction follows from the previons section, noting that 
when applying Lemma we nse that 


dvec LHqL 

M' 

5 vec LpL 0 


80 ' 


= V 
= L 


g,2 5vecSe 
80' 

8 vec y,e 

80 ' ■ 


(28) 

(29) 
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We have not explicitly covered the situation with random effects in 
the model (as in (|^) but the observed data likelihood follows immediately 
from the marginalization property of the normal distribution. Assume that 
y = {yiiVo) cire fully observed resp. right censored observation from an 
underlying normal model fe{Y, 77), with random effects r], then by Fubini’s 
theorem 


my) = 



f0{yi,u,r]) du dr] 
fe{yi,u,rj) dydu 


^yo 


fe(Vi,'u)du 


(30) 


^yo 


= fe{yi) / fe{u I 7 / 1 ) du. 

J®yo 

The last expression is identical to (|^ for which we have derived the 


score. 


3.3. Binary responses - the Probit model 

The multivariate Probit is trivially handled by noting that all observa¬ 
tions are either left censored at 0 (failure) or right censored at 0 (event). 
The probability of observing y = ( 7 / 1 ,..., pp) G {0,1}^ can therefore be ex¬ 
pressed as the orthant probability defined by the model-specific mean and 
variance 


P(W = y)= f 


dx. 


(31) 


In terms of ([^ this leads to the model 

l q 


F{Yij = 0 I Xi, Vi, r]i) = ^ XjkVik + KjrXi, 

k=l r=l 

I 

T ^ ^ djkVijkPik T Pq (Xji, . . . , Xj^)^ , 


(32) 


fc=l 


where $ is the CDF of the standard normal distribution. Hence, in this 
parameterization we are assuming that the residual terms of Y*j, Cjj, j = 
1,... ,p follow a standard normal distribution (not necessarily independent), 
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and the thresholds are fixed at zero as in Q. Other paranieterizations are 
possible, e.g. by constraining Var(l^*) = 1, and letting the intercepts of the 
model be zero. 


3.4- Implementation 

The maximnm likelihood estimates can be obtained via a ^-algorithm 


(Berndt et ah, 1974) or one of several generic optimization rontines available 


in R (R Development Core Team, 2015) (e.g the fnnction nlminb). Letting 
Si{6) denote the score contribntion of the ith individnal, we can obtain 
estimates of the standard errors of the estimates, 0, via the information 
matrix calcnlated as the onter prodnct of the score 


X„(6/) = 5^5,(6/)®2, (33) 

i=l 


also nsed in the RiL^-algorithm. 

Below we snmmarize the steps involved in calcnlating the score fnnction 


Algorithm 3.1 Calculating the score (9) 


For a single observation of responses tfi = {yn,.. .yim)' and covariates 




)', and given a set of parameter valnes, 6: 


1. a 

2. b 

3. C 


1.Identify dichotomons responses and redefine failnres as left censored 
at zero, and snccesses as right censored at zero. 


2.Identify index of observed variables, Xj and censored variables Xf, and 
let 111 iUij){j&Xi} and uq (2/p){jgxP}' 


3.Define matrix L as in (25), i.e. the diagonal matrix with -1 at entry 
i, if the ith coordinate of yo is right-censored, and ones on all other 
diagonal positions. 


d.Calcnlate model-specific conditional mean and variance Clg given 
covariates, and corresponding partial derivatives, and let V = 
and m' = 
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5.For {yiiyo) the underlying normal distribution of the fully observed 
process is partitioned as 


M 




Voi\ 

v-J 


(34) 


Calculate the marginal means, mi, mo, and variance, Vi, Vq, of yi, 
by extracting elements Xj. Similarly obtain matrix derivatives V/, Vq 
and m'i,mo by extracting the relevant rows from V and m'. 

6.Calculate 5'i(0), the score for ^i, as in equation Q based on Vi and 

mi 


7. Calculate the conditional mean and variance of the underlying normal 
distribution of yo given yi 


mo = mo + VmV„ '(y, - m,), 


(35) 


V„ = V„ - VoiV Vio, 


(36) 


and obtain the derivatives m'g and Vq as in (15)-(14). 

8. Calculate 

using the results of Lemma Q and ([^ with derivatives 

Q _ _ Q ^ ^ 

^imo = iinj and ^LVo-t = (i ® i)K- 


(37) 


9.Finally calculate the score contribution for the zth individual as 


S{6] yi, Xi, Vi) = Si{6) + S2{6). 


4. Estimation via composite marginal likelihood 

The computational bottleneck of the MLE method is the calculation of 
the derivatives of the normal CDF based on decomposing the distribution 
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function into all possible combinations of conditional distributions. For 
the hessian this leads to a computational complexity of order 0{k‘^). To 
remedy this problem in models with large fc, we propose to estimate model 
parameters via composite marginal likelihoods (Lindsay, 1998). We will 


compound over marginals dehned by the index /C, and dehne the composite 
marginal likelihood as 


= (38) 

keic 

and the corresponding composite score 

= (39) 

keK. 

where lAk is the score corresponding to the marginal log-likelihood We 

dehne 


X(0) = E(-VW(0;i/)) and J(0) = Var(W(0; ?/)). (40) 

As we essentially have a mis-specihed model, Bartlett’s identity no longer 
holds {X ^ J). Instead we can use the Godambe information 

g{e)=x{e)j{ey^x{e), (4i) 

and under general regularity conditions the composite likelihood maximizer 
is asymptotically normally distributed yn{6ci—d*) with 

6* minimizing the Kullback-Leibler divergence to the marginals of the true 
model distribution. 

In principle the second derivative in X can be calculated from the re¬ 
sults of the previous sections, or it can be approximated numerically. Also 
Bartlett’s identity holds for each marginal likelihood term, i.e. X can be 
calculated as 


n 

(42) 

i=l keK 

Similarly from the i.i.d. decomposition it follows that the empirical estima- 
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tor of J is 


n 


(43) 


2=1 


In the composite likelihood approach we can perform the calculations of the 
derivatives of the CDF for a hxed value corresponding to the size of the 
composite blocks. Using all pairs the growth in the data will be k{k — l)/2, 
and the asymptotic complexity is unchanged. Selecting only neighboring 
pairs, e.g. {(1,2), (2,3),...} will lead to a (!l(n)-algorithm. For consistency 
of the composite likelihood estimator, all composite blocks must be correctly 
specihed and the composition must embrace enough of the parameter space 
in order to identify the parameters of the full joint density. 

Generally the loss in efficiency might be compensated by a gain in com¬ 
putational robustness and efficiency. The proposed method of compounding 
the likelihood of adjacent variables, has been analyzed in a longitudinal set¬ 


ting (Joe and Lee, 2009) showing reasonable power for an auto-regressive 


correlation structure. In essence this is a limited information estimator but 


in contrast to (Muthen, 1984), we can in a natural way include incomplete 


observations in the analysis and make use of the increasing amount of infer¬ 


ential tools invented for composite likelihoods (see (Varin et al., 2011) for a 
recent overview). 


5. Usage 


The proposed method for estimation has been implemented in the R 


package lava.tobit ( 

Holst 

2012 

which acts as a plug-in to the package 

lava (Holst and Budtz-Jprgensen 

2013 

Holst, 

2015) (an implementation 


of the LLVM). 

As an example we will define a simple structural equation model 


r]i = 7iXi -F 72^2 -h Ci- (45) 

with i = l,...,n;j = 1,...,3, and eij ~ 7\^(0,ct|), (i ~ The 

model is visualized in the path diagram in Figure Using the lava-model 
syntax, this model can be dehned in R as 
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Figure 2: Path diagram of structural equation model defined by (441 and (45l. 


> library(lava.tobit) 

> m <- lvm() 

> regression(m) <- c(Yl ,Y2 ,Y3)~eta 

> latent(m) <- ~eta 

> regression(m) <- eta~Xl+X2 


# Initialize ’Ivm’ model 

# Measurement model 

# Define ’eta’ as random 

# Structural model 


Various methods for adding (non-linear) parameter constraints and co- 
variance between residual terms of the model exists. We refer to the man- 
pages of lava for further details. 

To simulate 500 observations from the above model, where we dichotomize 
I 2 and introduce hxed right censoring time at 1.5 for W, we write 


> set.seed(l) 

> d <- transform(sim(m, 500) , 

+ Y2=factor(Y2>0), 

+ Y3=Surv(ifelse(Y3<l.5,Y3,1.5),Y3<1.5)) 

with intercepts set to 0 and all other parameters 1 (the default parameter 
values of sim). To hnd the MLE, the estimate function is used, and since 
Y2 and Y3 were defined as a factor (with two levels) and as a Surv-object, 
respectively, the estimate method automatically calculates the likelihood 
and score function for these parts (Probit and Tobit, respectively) using the 
described method of Section 3. 
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> estimate(m,d) 



Estimate 

Std. Error 

Z value 

Pr(>|z|) 

Measurements: 

Y2<-eta 

0.8695611 

0.0913887 

9.5149756 

<le-16 

Y3<-eta 

1.0169726 

0.0472970 

21.5018283 

<le-16 

Regressions: 

eta<-Xl 

1.0039318 

0.0588643 

17.0550265 

<le-16 

eta<-X2 

1.0184532 

0.0605893 

16.8091257 

<le-16 

Intercepts: 

Y2 

-0.1033761 

0.0869258 

-1.1892456 

0.2343430 

Y3 

-0.0480167 

0.0688087 

-0.6978291 

0.4852841 

eta 

-0.0073254 

0.0662993 

-0.1104893 

0.9120214 

Residual Variances: 

Yl 

1.1241819 

0.1155639 

9.7277922 


Y3 

0.9106202 

0.0987181 

9.2244543 


eta 

1.1072682 

0.1221184 

9.0671677 



To demonstrate the composite likelihood method, we dichotomize Yl, 
and estimate the parameters of the model using the clprobit function; 


> d2 <- transformed, Yl=factor(Yl>0)) 

> clprobit(m,k=2,data=d2) 



Estimate 

Std. Error 

Z value 

Pr(>|z|) 

Measurements: 

Y2<-eta 

9.5979e-01 

1.8883e-01 

5.0828e+00 

3.7186e-07 

Y3<-eta 

9.2257e-01 

1.4058e-01 

6.5627e+00 

5.2835e-ll 

Regressions: 

eta<-Xl 

9.9233e-01 

1.4059e-01 

7.0583e+00 

1.6860e-12 

eta<-X2 

9.1147e-01 

3.1514e-01 

2.8922e+00 

3.8250e-03 

Intercepts: 

Y2 

-1.6560e-01 

1.0419e-01 

-1.5893e+00 

1.1199e-01 

Y3 

7.0581e-02 

9.0536e-02 

7.7959e-01 

4.3563e-01 

eta 

-1.2235e-01 

9.5063e-02 

-1.2870e+00 

1.9808e-01 

Residual Variances: 

Y3 

1.0816e+00 

1.5267e-01 

7.0847e+00 


eta 

1.0223e+00 

1.6114e-01 

6.3442e+00 
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In this example the block size was 2 and only adjacent variables were 
compounded, e.g. (hi, 1 ^ 2 ) and ( 12 ,^ 3 )- We refer to the help-page of 
clprobit on how to customize the blocks used in the composite likelihood 
estimation (as default continuous non-censored variables will not be split 
up). 


6. Simulation study 

In this section we will conduct a comparison between the limited infor¬ 
mation estimator (LIE) proposed in ( Muthen] |1984[ ) and MLE as obtained 
by our method. We will examine a model (see Figure]^ with a single binary 
outcome, Y, which follows a Probit model given a latent variable rj and a 
covariate X: 


$-1 (^p(y = 0 177, X)) = /i + + I32X 

The latent variable, 77 , is measured by 4 continuous variables 

= “3 + j = 


(46) 


(47) 


and A7(0,3)^), A7(0, fr^) are pairwise independent. We wish to 



Figure 3: Path diagram of structural equation model defined by and (47). 


estimate the effect of two covariates 77 and X on the binary outcome Y, 
which normally would be realizable within the Generalized Linear Model. 
However, we only have access to indirect measurements of 77 , Zi — Z 4 , and 
using one of these variables as predictor instead of 77 will yield a biased 
estimate of the effect of 77 , due to regression attenuation. The introduction 
of a measurement model remedies this. 
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We simulated n = 500 samples from the model with X A^(0,1), 
Xj = = 1, /X = ttj = 0, j = 1,..., 4, and the two parameters of 

primary interest /di = 1 and (32 = —0.5. The MLE and LIE were compared 


Table 1: Simulation study comparing MLE and LIE (|Muthen 
1984[ | based on 10,000 replications of 
model ([^ with = 0 | r],X)) = 

Variance Rel. Eff. 


sample-size 500 from the 
H + Pit] + I32X. 

Bias MSE 


SDiPsim) 


/i : 


MLE 

0.0094 

1 

0.001 

0.0094 

1.00 


LIE A) 

0.0098 

0.96 

0.001 

0.0098 

1.03 


LIE B) 

0.0099 

0.95 

0.001 

0.0099 

1.02 


LIE C) 

0.0100 

0.94 

0.001 

0.0100 

1.03 


LIE D) 

0.0098 

0.96 

0.001 

0.0098 

1.03 

/3i: 


MLE 

0.0128 

1 

0.020 

0.0132 

1.00 


LIE A) 

0.0139 

0.93 

0.023 

0.0144 

1.03 


LIE B) 

0.0146 

0.88 

0.025 

0.0152 

1.02 


LIE C) 

0.0188 

0.68 

0.030 

0.0197 

1.03 


LIE D) 

0.0139 

0.93 

0.023 

0.0144 

1.03 

(32-. 


MLE 

0.0077 

1 

-0.009 

0.0078 

0.99 


LIE A) 

0.0123 

0.63 

-0.009 

0.0123 

1.02 


LIE B) 

0.0087 

0.89 

-0.011 

0.0088 

1.01 


LIE C) 

0.0163 

0.47 

-0.015 

0.0165 

1.03 


LIE D) 

0.0083 

0.93 

-0.010 

0.0084 

1.01 


for model (46)-(47) (model A). We also calculated the LIE for the model 
(model D) where a direct effect of X on 77 was included (the parameter /3o 
in Figure a model where a covariance parameter between r] and x was 
included (model B), and a model where covariance between the rj and x was 
included but hxed to zero (model C). The results of the simulation study 
based on 10,000 replications are shown in Table 

As the MLE asymptotically dehnes the Cramer-Rao lower-bound, we 
calculated the relative efficiency as the ratio between the variance of the 
Monte Carlo parameter estimates of the MLE and the variance of the LIE 
estimates. For the intercept parameter, fi, all estimators performed well 
with practically no bias and a relative efficiency of the LIE around 0.95. 
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The LIE of model A and D had a comparable performance with a relative 
efficiency of 0.93 for the parameter /3i. Remarkably the LIE of model C 
performed signihcantly worse, even though it should be equivalent with 
model A. This might be an implementation artifact. With a sample-size 
of n = 500 we still saw a little bias for the parameter /?i (smallest for 
the MLE). For (^2 the bias of all estimators was acceptable, but to our 
surprise the LIE for the true model (A) performed much worse than model 
D (relative efficiency 0.93) and B (relative efficiency 0.89). 

The performance of the estimates of the standard errors of the param¬ 
eters was quantihed by calculating the ratio between the average standard 
error estimate and the standard deviation of the Monte Carlo parameter 
estimates. The standard error estimates generally performed acceptable for 
both the LIE and MLE (calculated via the outer product of the score). Per¬ 
haps there is a tendency towards the standard errors from the LIE being a 
little too conservative. 

Overall we can conclude that model D, where the structural model was 
modeled as rj = PqX yielded the best results for the LIE even under the 
model Po = 0. While clearly outperformed by the MLE, the loss in efficiency 
might for some applications be out-weighted by a gain in computational 
efficiency of the LIE. On the other hand, the lack of access to likelihood 
ratio testing, prohle likelihood conhdence intervals, the ability to handle 
incomplete observations, etc., might render this option less attractive. 


All simulations were conducted in Mplus version 5.1 (Muthen and Muthen 


2007 

) and lava.tobit version 0.4-7 ( 

Holst 

2012 


7. Application 

In this section we will demonstrate the modeling framework on brain 
imaging and personality data. Understanding the relationship between per¬ 
sonality and biological processes in the brain is recognized as being an im¬ 
portant step towards understanding the etiology of mental disorders such as 
schizophrenia and depression. Dysfunction of the serotonin (5-hydroxytryp- 
tamine (5-HT)) 5-HT2 a receptors is considered involved in the development 
of such disorders ( Williams et al.| 1996, Choi et ah, 2004), thus motivating 
the search for personality traits associated with the serotonergic system. 


Cloninger (Cloninger, 1987) developed the Temperament and Character 


Inventory (TCI) personality model with a biological interpretation derived 
from animal studies. The TCI scale is based on dichotomous questionnaires 
which are summarized into personality traits describing different patterns 
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of behavior such as Harm Avoidance, Reward Dependence and Novelty seek¬ 
ing. It was originally hypothesized that the neurotransmitter systems were 
related uniquely to specihc traits, for instance Harm avoidance and the 
serotonergic system. However, it has later been proposed that the seroton¬ 
ergic system could be related to regulation of other personality traits ([Paris 


2005) as suggested by both genetic and imaging studies (Ebstein et ah, 1997 


Goethals et al., 2007). The trait Reward Dependence {RD) characterizes 
how a person reacts to reward or punishment. It also seems that serotonin 
plays an active role for how people reacts towards reward. For example, an¬ 
imal studies have shown that decreased serotonin levels cause more frequent 


impulsive choices (Bizot et ah, 1999, Mobini et ah, 2000). 


In this study we therefore wanted to examine the relationship between 
measurements of the serotonergic system in the human brain and the trait 
Reward Dependence. A Danish translation of the self-administered TCI- 
R questionnaire (Cloninger, 1994) was answered by 170 subjects. Reward 
Dependence is traditionally quantihed as the sum of three sub-dimensions 
Sentimentality {RDi, 10 questions). Attachment {RD^, 8 questions), and 
Dependence {RD^, 6 questions). Each of the sub-dimensions are calculated 
as the sum of the related dichotomous questions. As a preliminary exercise, 
we examined the underlying factor structure that justihes the use of these 
summary statistics. The requirements of criterion related construct validity 
(Rosenbaum, 1989| ) are uni-dimensionality, monotonicity and local indepen¬ 
dence (conditional independence given the latent variable) as illustrated in 

m 



(48) 
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with 


$ 


-1 


P(Fi ^ 0|r/)^ ^ IM + Ail/. 


(49) 


Evidence against this model structure was tested with a likelihood ratio 
test against the saturated model with a free mean and correlation struc¬ 
ture. The factor analysis of RDi and RD^ indicated some problems with 
these scales and we therefore focused on RD 4 . One question (number 71) 
had to be removed from the analysis due to too little variation causing nu¬ 
merical instability. Three individuals had incomplete observations, which 
was handled in the MLE approach by ignoring the missing data mecha¬ 
nism under a missing at random assumption ([Little and Rubin 2002). The 


omnibus-test yielded a p-value of 0.58 thus not showing any evidence 
against the suggested factor structure. For a reliable scale, we would fur¬ 
ther expect all factor loadings to be equal (after appropriate coding of the 
questions). A likelihood ratio test against this model yielded a p-value of 
0.61. This suggests that RD 4 is a valid scale. The dimension Dependence 
has also previously been shown to be a valid and reliable scale in a sample of 
hospitalized and ambulant Danish psychiatric patients ( [Thorleifsson et ^ 
2010). In the subsequent analyses, we therefore used the suggested factor 


structure for RD 4 with equal factor loadings. 

Our aim was to quantify the association between central 5-HT function 
and the trait Dependence. Unfortunately it is not possible to measure brain 
5-HT levels directly in vivo. However, several studies have demonstrated 
a (negative) linear relationship between brain 5-HT levels and 5-HT2A re¬ 


ceptor binding (Gunther et ah, 2008, Cahir et al., 2007), suggesting that 


low 5-HT leads to compensatory up-regulation of 5-HT2A receptor bind¬ 
ing. We therefore proposed a measurement model where 5 -HT 2 A receptor 
binding potential measurements were perceived as indirect measurements 
of underlying global 5-HT level: 


— 5 -HT 2 AJ — P'j + ■ 5-HT -|- Cj. 


(50) 


5 -HT 2 A receptor binding potential potential (BPp) was quantihed by Positron 
Emission Tomography (PET) techniques and segmented into summary statis¬ 


tics of anatomically dehned regions (Svarer et al., 2005). For our analysis 


we chose three regions of interest which previously have been shown to yield 
reliable quantihcation of 5 -HT 2 A receptor binding: Superior frontal cortex 
(sfc). Anterior cingulate gyrus (acg), and Posterior cingulate gyrus (peg) 
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(see Figure]^. Effects of age and BMI have previously been demonstrated 
and was therefore added to the structural model (with BMI dichotomized 
as overweight (BMI>25) vs non-overweight); 

5-HT oc 5 -HT 2 A = 7i ■ Age -|- 72 ■ BMI -|- (51) 


We conducted an analysis of 56 subjects who completed the TCI question¬ 
naire and also underwent PET examination of the 5 -HT 2 A receptor binding 


potential. We refer to the paper (Erritzoe et ah, 2010) for a description of 


this PET sample and details on the methodology. The measurement model 
for the 5 -HT 2 A receptor measurements was combined with the measurement 
model of Dependence (see (48)). The two measurement models were linked 
with a regression between the two latent variables 


]E(i?Z)4 I 5-HT2 a) — (d ■ 5-HT2A; (^2) 

leading to the model described in the path-diagram of Figure 



Figure 4: Structural equation model describing the association between 5 -HT 2 A receptor 
binding potential and Reward Dependence. 
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The parameters of the model were estimated with MLE as described 
in Section In the analysis we had to remove one additional item from 
the measnrement model of RD 4 ^ (question number 131) to avoid numerical 
instability. 

The regression coefficient (3 is interpreted as the linear effect of the un¬ 
derlying 5 -HT 2 A receptor binding potential level on the latent personality 
trait Dependence. This interpretation becomes clearer using the standard¬ 
ized coefficient, which we estimated to a increase of 0.791 standard deviation 
in Dependence caused by increasing the 5 -HT 2 A receptor binding potential 
levels one standard deviation (p-value 0.04). Hence high 5 -HT 2 A receptor 
binding levels (low 5-HT levels) are associated with higher values of Depen¬ 
dence (more sensitive and socially dependent individuals). 



Latent 5-HT2A receptor binding levels 


Figure 5: Probabilities of answering yes to different questions of the Dependence per¬ 
sonality trait, with varying levels of latent 5-HT levels as measured by 5 -HT 2 A receptor 
binding potential. The x-axis is measured in standard deviations away from the mean 
(reference: non-overweight, age 36). 

The model also allows us estimate the conditional probabilities of the 
answers to the different questions given the 5-HT levels. For instance, for 
the first question in the RD^, we have 

P(Fi = 1 I 5 -HT 2 A) = $(0.42 + 1 ■ 0.425-HT2a). 

See Figure Similarly the joint probability of different combinations of 
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questions could be evaluated. Also, to obtain a better understanding of 
the 5-HT/5-HT2A receptor effect, we can calculate the probability ratio of 
answering positively to the hrst question, for a subject with average 5 -HT 2 A 
receptor levels and a subject with 5 -HT 2 A receptor levels two standard de¬ 
viations away from the average, 2 ■ SD(5-HT2 a | Age, BMI) = 0.96, 

P(yi = 0 I 5-HT = 0.96) _ 

P(lh = 0 I 5-HT = 0) 

indicating a 20% higher probability of answering positively to question 1 
for the subject with high levels of 5 -HT 2 A receptor binding potential (95% 
conhdence limits by the Delta method: [1.05; 1.34]). 

8. Conclusion 

Various methods for estimating parameters in models with correlated 
multivariate binary data has been proposed. Monte Carlo based meth¬ 
ods such as the MCEM algorithm and Bayesian methods are possible, but 
convergence can be slow and monitoring of the convergence of the Markov 
chains to the stationary distribution and assessment of the influence of prior 
distributions in the Bayesian context remains controversial topics. Deter¬ 
ministic integration such as AGQ also suffers from ad hoc decisions which 
must be made regarding the number of quadrature points to achieve a rea¬ 
sonable approximation. For complex models with many latent variables, it 
is well-known that this approach may break down in practice. 

Our method restricts itself to models with a Probit link, and we have 
demonstrated how this leads to an algorithm, which allows us to perform 
fast and precise evaluations of the score and likelihood function of the model. 
Implementation of generalizations to estimators based on score equations 
such as Inverse Probability Weighting should therefore follow immediately. 
In contrast for methods based on applying the Laplace approximation or 
AGQ where numerical derivatives of the integrated log-likelihood typically 
is used to approximate the score function, such generalizations could prove 
much more difficult. Our proposal is most well-suited for models with a 
moderate number of observed items and possible very complex latent struc¬ 
ture. However, the composite likelihood approach scales well with dimension 
in both number of items and latent variables, and may serve as an impor¬ 
tant framework for the estimation of complex structural equation models 
with non-normal response variables. 
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A strong feature of our model is the possibility of combining continuous, 
dichotomous and censored observations in a simultaneous model. This may 


serve as an important framework for the examination of causality (Ditlevsen 


et ah, 2005). 


The model has been implemented in full generality in the lava and 


lava.tobit packages (Holst 
r-project.org, 


2012) available freely on http://r-forge. 
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